    Sélection itérative de transformations pour la classification d'images

    National audience. In image classification, an effective strategy for learning a classifier that is invariant to certain transformations is to augment the training set with the same examples, but with the transformations applied to them. However, when the set of possible transformations is large, it can be difficult to select a small number of relevant transformations while keeping the training set at a reasonable size. Indeed, not all transformations have the same impact on performance; some can even degrade it. We propose an algorithm for the automatic selection of transformations: at each iteration, the transformation that yields the largest gain in performance is selected. We evaluate our approach on the images of the ImageNet 2010 challenge and improve top-5 accuracy from 70.1% to 74.9%.

    Convolutional Patch Representations for Image Retrieval: an Unsupervised Approach

    International audience. Convolutional neural networks (CNNs) have recently received a lot of attention due to their ability to model local stationary structures in natural images in a multi-scale fashion when all model parameters are learned with supervision. While they achieve excellent performance for image classification when large amounts of labeled visual data are available, their success on unsupervised tasks such as image retrieval has been moderate so far. Our paper focuses on this latter setting and explores several methods for learning patch descriptors without supervision, with application to matching and instance-level retrieval. To that effect, we propose a new family of convolutional descriptors for patch representation, based on the recently introduced convolutional kernel networks. We show that our descriptor, named Patch-CKN, performs better than SIFT as well as other convolutional networks learned by artificially introducing supervision, and is significantly faster to train. To demonstrate its effectiveness, we perform an extensive evaluation on standard benchmarks for patch and image retrieval, where we obtain state-of-the-art results. We also introduce a new dataset called RomePatches, which allows descriptor performance to be studied simultaneously for patch and image retrieval.
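    A minimal sketch of the kind of pipeline this abstract describes: compute a convolutional descriptor for each patch and match patches by cosine similarity. The filter bank below is random, standing in for the unsupervised CKN training the paper proposes, which this sketch does not implement; all names and sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_descriptor(patch, filters):
    """Convolve a grayscale patch with a filter bank, apply ReLU,
    average-pool each response map, and L2-normalize the result."""
    k = filters.shape[1]
    h, w = patch.shape
    responses = []
    for f in filters:
        # "valid" convolution via an explicit sliding window (small sizes only)
        out = np.zeros((h - k + 1, w - k + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(patch[i:i + k, j:j + k] * f)
        responses.append(np.maximum(out, 0).mean())  # ReLU + average pooling
    v = np.array(responses)
    return v / (np.linalg.norm(v) + 1e-8)

filters = rng.standard_normal((32, 5, 5))   # 32 random 5x5 filters (untrained)
patches = rng.random((10, 16, 16))          # 10 synthetic 16x16 patches
descs = np.stack([conv_descriptor(p, filters) for p in patches])

# Descriptors are unit-norm, so the dot product is the cosine similarity.
similarity = descs @ descs.T
```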

    Transformation Pursuit for Image Classification

    International audience. A simple approach to learning invariances in image classification consists in augmenting the training set with transformed versions of the original images. However, given a large set of possible transformations, selecting a compact subset is challenging. Indeed, not all transformations are equally informative, and adding uninformative transformations increases training time with no gain in accuracy. We propose a principled algorithm, Image Transformation Pursuit (ITP), for the automatic selection of a compact set of transformations. ITP works in a greedy fashion, selecting at each iteration the transformation that yields the highest accuracy gain. ITP also makes it possible to efficiently explore complex transformations that combine basic ones. We report results on two public benchmarks: the CUB dataset of bird images and the ImageNet 2010 challenge. Using Fisher vector representations, we improve top-1 accuracy on CUB from 28.2% to 45.2%, and top-5 accuracy on ImageNet from 70.1% to 74.9%. We also show significant improvements for deep convnet features: from 47.3% to 55.4% on CUB and from 77.9% to 81.4% on ImageNet.
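    A compact sketch of the greedy loop ITP performs, under stated assumptions: `augment` and `train_and_score` are hypothetical placeholders (the first applies a set of transformations to the training data, the second trains a classifier and returns validation accuracy). The actual ITP also explores compositions of transformations and trains incrementally, which this sketch omits.

```python
def transformation_pursuit(candidates, X_train, y_train, X_val, y_val,
                           augment, train_and_score, n_select=3):
    """Greedily pick the transformations with the highest accuracy gain."""
    selected = []
    best_acc = train_and_score(X_train, y_train, X_val, y_val)
    for _ in range(n_select):
        gains = {}
        for t in candidates:
            if t in selected:
                continue
            Xa, ya = augment(X_train, y_train, selected + [t])
            gains[t] = train_and_score(Xa, ya, X_val, y_val) - best_acc
        if not gains:
            break
        t_best = max(gains, key=gains.get)
        if gains[t_best] <= 0:   # stop early if no transformation helps
            break
        selected.append(t_best)
        best_acc += gains[t_best]
    return selected, best_acc
```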

    Large-scale image classification with trace-norm regularization

    International audience. With the advent of larger image classification datasets such as ImageNet, designing scalable and efficient multi-class classification algorithms is now an important challenge. We introduce a new scalable learning algorithm for large-scale multi-class image classification, based on the multinomial logistic loss and the trace-norm regularization penalty. Reframing the challenging non-smooth optimization problem as a surrogate infinite-dimensional optimization problem with a regular l1-regularization penalty, we propose a simple and provably efficient accelerated coordinate descent algorithm. Furthermore, we show how to perform efficient matrix computations in the compressed domain for quantized dense visual features, scaling up to hundreds of thousands of examples, features with thousands of dimensions, and hundreds of categories. Promising experimental results on the "Fungus", "Ungulate", and "Vehicles" subsets of ImageNet are presented, where we show that our approach performs significantly better than state-of-the-art approaches for Fisher vectors with 16 Gaussians.
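    The paper's own solver is an accelerated coordinate descent on an l1 reformulation; as a simpler illustration of trace-norm regularization itself, the sketch below runs proximal gradient descent on the multinomial logistic loss, using the fact that the proximal operator of the trace norm soft-thresholds singular values. Data, names, and hyperparameters are illustrative.

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def proximal_gradient(X, y, n_classes, lam=0.1, lr=0.1, n_iter=200):
    """Trace-norm-regularized multinomial logistic regression."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    Y = np.eye(n_classes)[y]          # one-hot labels
    for _ in range(n_iter):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / n      # gradient of the multinomial logistic loss
        W = prox_trace_norm(W - lr * grad, lr * lam)
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.integers(0, 10, size=200)
W = proximal_gradient(X, y, n_classes=10)   # W comes out (approximately) low-rank
```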

    The INRIA-LIM-VocR and AXES submissions to Trecvid 2014 Multimedia Event Detection

    This paper describes our participation in the 2014 edition of the TrecVid Multimedia Event Detection task. Our system is based on a collection of local visual and audio descriptors, which are aggregated into global descriptors, one for each type of low-level descriptor, using Fisher vectors. Besides these features, we use two features based on convolutional networks: one for the visual channel and one for the audio channel. Additional high-level features are extracted using ASR and OCR. Finally, we use mid-level attribute features based on object and action detectors trained on external datasets. Our two submissions (INRIA-LIM-VocR and AXES) are identical in terms of all components except the ASR system. We present an overview of the features and the classification techniques, and experimentally evaluate our system on TrecVid MED 2011 data.
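    To illustrate the aggregation step, here is a simplified Fisher vector encoder: gradients with respect to the GMM means only, whereas the full encoding used in such systems also includes weight and variance terms. The data is synthetic and the layout assumes diagonal covariances.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_means(local_descs, gmm):
    """Mean-gradient part of the Fisher vector for a set of local descriptors."""
    gamma = gmm.predict_proba(local_descs)   # (N, K) soft assignments
    N, K = gamma.shape
    sigma = np.sqrt(gmm.covariances_)        # (K, d) for diagonal covariances
    parts = []
    for k in range(K):
        diff = (local_descs - gmm.means_[k]) / sigma[k]
        parts.append((gamma[:, k, None] * diff).sum(axis=0)
                     / (N * np.sqrt(gmm.weights_[k])))
    fv = np.concatenate(parts)
    # Power- and L2-normalization, as is standard for Fisher vectors.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    return fv / (np.linalg.norm(fv) + 1e-8)

rng = np.random.default_rng(0)
descs = rng.standard_normal((500, 16))                 # fake local descriptors
gmm = GaussianMixture(n_components=8, covariance_type="diag").fit(descs)
clip_repr = fisher_vector_means(descs, gmm)            # one global descriptor
```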

    De l'apprentissage de représentations visuelles robustes aux invariances pour la classification et la recherche d'images

    This dissertation focuses on designing image recognition systems that are robust to geometric variability. Image understanding is a difficult problem: images are two-dimensional projections of 3D objects, and representations that must fall into the same category, for instance objects of the same class in classification, can display significant differences. Our goal is to make systems robust to the right amount of deformations, this amount being automatically determined from data. Our contributions are twofold: we show how to use virtual examples to enforce robustness in image classification systems, and we propose a framework for learning robust low-level descriptors for image retrieval. We first focus on virtual examples, obtained as transformations of real ones. One image generates a set of descriptors (one for each transformation), and we show that data augmentation, i.e., considering them all as i.i.d. samples, is the best-performing way to use them, provided a voting stage with the transformed descriptors is conducted at test time. Because transformations carry various levels of information, can be redundant, and can even be harmful to performance, we propose a new algorithm able to select a set of transformations while maximizing classification accuracy. We show that a small number of transformations is enough to considerably improve performance for this task. We also show how virtual examples can replace real ones for a reduced annotation cost, and report good performance on standard fine-grained classification datasets. In a second part, we aim at improving the local region descriptors used in image retrieval, and in particular propose an alternative to the popular SIFT descriptor. We propose new convolutional descriptors, called Patch-CKN, which are learned without supervision. We introduce a linked patch- and image-retrieval dataset based on structure from motion applied to web-crawled images, and design a method to accurately test the performance of local descriptors at both patch and image level. Our approach outperforms SIFT and all tested convolutional architectures on our patch and image benchmarks, as well as on several state-of-the-art datasets.
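    An illustrative sketch of the test-time voting step mentioned above: average a classifier's scores over transformed versions of a test example (including the identity transformation), then predict the top class. For simplicity the transformations here act directly on feature vectors, whereas the thesis applies them to images before feature extraction.

```python
import numpy as np

def predict_with_voting(clf, x, transformations):
    """Vote over transformed versions of a test example.

    `clf` is any model with a predict_proba method; each transformation maps
    a feature vector to a feature vector. Include `lambda v: v` (identity)
    in `transformations` to keep the original example in the vote.
    """
    versions = [t(x) for t in transformations]
    scores = np.mean([clf.predict_proba(v[None, :])[0] for v in versions],
                     axis=0)
    return scores.argmax()
```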

    Bayes Risk for Large Scale Hierarchical Top-K Image Classification

    Local Convolutional Features with Unsupervised Training for Image Retrieval

    International audience. Patch-level descriptors underlie several important computer vision tasks, such as stereo matching and content-based image retrieval. We introduce a deep convolutional architecture that yields patch-level descriptors, as an alternative to the popular SIFT descriptor for image retrieval. The proposed family of descriptors, called Patch-CKN, adapts the recently introduced Convolutional Kernel Network (CKN), an unsupervised framework for learning convolutional architectures. We present a comparison framework to benchmark current deep convolutional approaches along with Patch-CKN for both patch and image retrieval, including our novel "RomePatches" dataset. Patch-CKN descriptors yield competitive results compared to supervised CNN alternatives on patch and image retrieval.
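    One simple way to score such a patch benchmark, assuming each query patch has a known matching database patch: recall@k over nearest neighbors in descriptor space. This is an illustrative metric, not necessarily the exact protocol used in the paper.

```python
import numpy as np

def patch_retrieval_recall(query_descs, db_descs, gt_index, k=5):
    """Fraction of queries whose ground-truth match (gt_index[i]) appears
    among the k database patches with the most similar descriptors.
    Descriptors are assumed L2-normalized, so dot product = cosine similarity."""
    sims = query_descs @ db_descs.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [gt_index[i] in topk[i] for i in range(len(gt_index))]
    return float(np.mean(hits))
```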